[WaveTransform] Generate non-SSA Exec mask manipulation instrs #789

Merged
lalaniket8 merged 11 commits into amd-feature/wave-transform from amd/dev/lalaniket8/non-ssa-compliant-wave-transform
Feb 2, 2026

@lalaniket8 lalaniket8 commented Dec 8, 2025

Since we are moving Wave Transform to the middle of register allocation, after PHI elimination, the EXEC mask manipulation instructions that Wave Transform adds to the code should not be in SSA form.
This PR contains the code changes to support this.
We remove the SSAUpdater that was used originally and instead use a single Accumulator register to capture the contributions from the thread-level CFG predecessors of a basic block. This Accumulator is used to set the appropriate EXEC mask. The reset-flag semantics of GCNLaneMaskUpdater are retained and used to reset the Accumulator at the correct points in the code.
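
A minimal sketch of the accumulator idea described above, written as a standalone C++ toy model rather than the actual LLVM code. The names `LaneMask`, `BlockStep`, and `runAccumulator` are hypothetical and do not appear in the PR, and the real pass emits machine instructions instead of computing masks on the host. The sketch only illustrates how one mutable register, OR-ed with each contribution and cleared wherever a reset flag is set, can stand in for a chain of SSA-merged values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model types; none of these names come from the PR.
using LaneMask = std::uint64_t;

struct BlockStep {
  LaneMask contribution; // lanes this block adds to the accumulator (0 if none)
  bool resetAtEnd;       // models a reset flag: clear the accumulator after this block
};

// Runs the blocks in the given order against ONE mutable accumulator and
// returns the accumulator value seen at the end of each block, before any reset.
std::vector<LaneMask> runAccumulator(const std::vector<BlockStep> &steps) {
  LaneMask acc = 0;
  std::vector<LaneMask> observed;
  observed.reserve(steps.size());
  for (const BlockStep &s : steps) {
    acc |= s.contribution; // non-SSA: the same "register" is redefined in place
    observed.push_back(acc);
    if (s.resetAtEnd)
      acc = 0; // reset point, so later blocks start from an empty mask
  }
  assert(observed.size() == steps.size());
  return observed;
}
```

For example, the steps `{0x3, false}`, `{0xC, true}`, `{0x1, false}` yield the masks 0x3, 0xF, and 0x1: the third block starts from a cleared accumulator because the second block was flagged for reset.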

@z1-cciauto
Collaborator

Failed to trigger build:

@lalaniket8 lalaniket8 changed the title from "Wave Transform should generate non SSA Exec mask manipulation instrs" to "Wave Transform to generate SSA Exec mask manipulation instrs" Dec 9, 2025
@lalaniket8 lalaniket8 marked this pull request as ready for review December 9, 2025 05:20

@cdevadas cdevadas left a comment

Have you run clang-format? The formatting looked off in some places.

@z1-cciauto
Collaborator

Failed to trigger build:

@lalaniket8
Author

> Have you run clang-format? The formatting looked off in some places.

Addressed in the latest commit.

@cmc-rep

cmc-rep commented Dec 9, 2025

I have started reviewing the code change. In the meantime, we have provided multiple tests under llvm/test/CodeGen/AMDGPU; could we update those tests?

  • For the .ll files, add a run line that stops after wave-transform, and check the generated MIR.
  • For the MIR files, manually update them to non-SSA form and run the wave-transform pass.

Actually, for the .ll files, I would suggest that we first have a separate PR that adds those run lines and check results showing what the MIR looks like right before wave-transform. Hopefully those tests are easy to add and can get merged before this PR.
This PR will then update those tests with the result after wave-transform. This way, we can compare the MIR before and after wave-transform during code review.

@cmc-rep

cmc-rep commented Dec 9, 2025

Also, please make sure all the code comments are up to date with the code changes. For example, any comment mentioning PHI nodes is likely out of date.

Also, I feel that we should clean up LaneMaskUtil code that does not really get used. For example, for our application, we always assume accumulating == true. If we are not going to maintain the code that assumes accumulating == false, we may want to delete it. I personally would prefer keeping the code as simple as possible.

@cdevadas

> I have started reviewing the code change. In the meantime, we have provided multiple tests under llvm/test/CodeGen/AMDGPU; could we update those tests?

I thought about that initially. Once this patch is merged, the next patch will enable wave-transform by default, which would cover all lit tests in the new pipeline. At the moment, most lit tests would break if wave-transform were force-enabled, because the original implementation depends on the SSAUpdater and introduces PHI nodes. However, it makes sense to add some selected tests to verify the new wave-transform changes.

>   • For the .ll files, add a run line that stops after wave-transform, and check the generated MIR.

It would be better to select some control-flow tests involving loops and if-else and stop after the wave-transform pass. @lalaniket8, can you identify some tests and pre-commit the new changes?

>   • For the MIR files, manually update them to non-SSA form and run the wave-transform pass.

> Actually, for the .ll files, I would suggest that we first have a separate PR that adds those run lines and check results showing what the MIR looks like right before wave-transform. Hopefully those tests are easy to add and can get merged before this PR. This PR will then update those tests with the result after wave-transform. This way, we can compare the MIR before and after wave-transform during code review.

@lalaniket8
Author

lalaniket8 commented Dec 10, 2025

> Also, please make sure all the code comments are up to date with the code changes. For example, any comment mentioning PHI nodes is likely out of date.

Should we also remove the SSAReconstructor class in AMDGPUWaveTransform.cpp, since it is no longer needed?

> Also, I feel that we should clean up LaneMaskUtil code that does not really get used. For example, for our application, we always assume accumulating == true. If we are not going to maintain the code that assumes accumulating == false, we may want to delete it. I personally would prefer keeping the code as simple as possible.

Yes, I think it is a good idea to remove the Default mode and keep only the Accumulating mode; it will simplify the code a lot.
Should I open another PR for this cleanup, or add a commit to this PR?

@cdevadas

>> Also, please make sure all the code comments are up to date with the code changes. For example, any comment mentioning PHI nodes is likely out of date.

> Should we also remove the SSAReconstructor class in AMDGPUWaveTransform.cpp, since it is no longer needed?

>> Also, I feel that we should clean up LaneMaskUtil code that does not really get used. For example, for our application, we always assume accumulating == true. If we are not going to maintain the code that assumes accumulating == false, we may want to delete it. I personally would prefer keeping the code as simple as possible.

> Yes, I think it is a good idea to remove the Default mode and keep only the Accumulating mode; it will simplify the code a lot. Should I open another PR for this cleanup, or add a commit to this PR?

You can add the cleanup to this PR itself.

@cdevadas

>> Also, please make sure all the code comments are up to date with the code changes. For example, any comment mentioning PHI nodes is likely out of date.

> Should we also remove the SSAReconstructor class in AMDGPUWaveTransform.cpp, since it is no longer needed?

How about the second part of the SSAReconstructor, which deals with the dominance relation between defs and their respective uses? Keep it for now. In any case, we have disabled the SSAReconstructor.run() invocation for now. Let's see whether any fixup is needed later, when we turn on the wave-transform pipeline by default.

@z1-cciauto
Collaborator

Failed to trigger build:

@cmc-rep

cmc-rep commented Dec 10, 2025

The cleanup looks good to me.

In terms of testing, I was suggesting that we first add run lines to those .ll tests that stop before wave-transform, in a separate PR; I expect that should work (not crash). If we can add more tests for more control-flow situations, that would be even better.

In this PR, we should then turn those stop-before run lines into stop-after. We have multiple people here to examine those test results to ensure correctness, which should be a good and healthy exercise.

@skganesan008
Collaborator

!PSDB

@z1-cciauto
Collaborator

SSAUpdater.AddAvailableValue(
Info.Block,
(Info.Value && !(Info.Flags & ResetAtEnd)) ? Info.Merged : ZeroReg);
if(!Info.Value || (Info.Flags & ResetAtEnd))
Author

I want to discuss a further optimization here.

GCNLaneMaskUpdater::process() processes the BlockInfo for the following blocks:
X - the block for which we are computing the EXEC mask
R - the set of predecessors of X in the reconverged CFG
T - the set of predecessors of X in the thread-level CFG

Info.Value is set for all blocks in T (via GCNLaneMaskUpdater::addAvailable(), called from ControlFlowRewriter::rewrite()).
ResetAtEnd is set for all blocks in R (via GCNLaneMaskUpdater::addReset(), called from ControlFlowRewriter::rewrite()).

The SSAUpdater marks ZeroReg or MergedReg as available according to the condition:
(Info.Value && !(Info.Flags & ResetAtEnd)) ? Info.Merged : ZeroReg

which translates to:
SSAUpdater.AddAvailableValue(x, MergedReg) for x \in T and x \notin R
SSAUpdater.AddAvailableValue(x, ZeroReg) for x \in R UNION (x \notin R and x \notin T)

The non-SSA approach uses a single Accumulator register to store the contributions from each block in T, instead of defining multiple Merged registers. This Accumulator is reset at the end of the blocks where the SSAUpdater originally marked ZeroReg as available.

Therefore, we add instructions that reset the Accumulator to 0 at the end of every block x in (x \in R) UNION (x \notin R and x \notin T).

I believe we can reduce this set further to just x \in R.
This should work because (x \notin R and x \notin T), when not empty, corresponds to the block X itself, with X \notin R and X \notin T.

X is directly preceded by blocks in R in the reconverged CFG.
Blocks in R will have an Accumulator reset instruction at their end.
Therefore, adding an Accumulator reset instruction at the end of X is redundant.

Kindly let me know if this logic seems sound.
[Image: RefinedConditionForAccReset]
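
The set argument above can be sketched as a small standalone C++ model. This is an illustration under stated assumptions, not code from the PR: `BlockId`, `BlockSet`, `resetSetOriginal`, `resetSetProposed`, and `droppedResets` are hypothetical names, and blocks are plain integers. It only shows which processed blocks lose their reset under the proposal. It does not model control flow, so it says nothing about whether dropping the reset at X is safe when X sits in a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical model: blocks are integers, R and T are sets of block ids.
using BlockId = int;
using BlockSet = std::set<BlockId>;

// Reset set implied by the quoted condition
//   !Info.Value || (Info.Flags & ResetAtEnd)
// which reads as "x is not in T, or x is in R".
BlockSet resetSetOriginal(const BlockSet &Processed, const BlockSet &R,
                          const BlockSet &T) {
  BlockSet Out;
  for (BlockId B : Processed)
    if (T.count(B) == 0 || R.count(B) != 0)
      Out.insert(B);
  return Out;
}

// Proposed reduced reset set: only the processed blocks that are in R.
BlockSet resetSetProposed(const BlockSet &Processed, const BlockSet &R) {
  BlockSet Out;
  for (BlockId B : Processed)
    if (R.count(B) != 0)
      Out.insert(B);
  return Out;
}

// Blocks that had a reset originally but would not under the proposal.
BlockSet droppedResets(const BlockSet &Original, const BlockSet &Proposed) {
  // The proposal only removes resets, so it must be a subset of the original.
  assert(std::includes(Original.begin(), Original.end(), Proposed.begin(),
                       Proposed.end()));
  BlockSet Out;
  std::set_difference(Original.begin(), Original.end(), Proposed.begin(),
                      Proposed.end(), std::inserter(Out, Out.begin()));
  return Out;
}
```

With X = 0, R = {1, 2}, and T = {2, 3}, the original set is {0, 1, 2}, the proposed set is {1, 2}, and the only dropped reset is the one at X.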

I think we may need to reset at the end of X when X is in the loop. I am not sure.

@lalaniket8 lalaniket8 force-pushed the amd/dev/lalaniket8/non-ssa-compliant-wave-transform branch from 0cd0d8d to 0d48750 Compare January 9, 2026 13:17
@z1-cciauto
Collaborator

@cmc-rep

cmc-rep commented Jan 11, 2026

I am halfway through checking wavetransform-partial-join.mir. So far so good. I have not gotten to wavetransform-natural-loops.mir yet; it is going to take some time.
I feel that we need at least one more person to examine these results at the same time. Getting wave-transform right in the first place is important. Detecting bugs in this area will be tricky later on.

@lalaniket8
Author

> I am halfway through checking wavetransform-partial-join.mir. So far so good. I have not gotten to wavetransform-natural-loops.mir yet; it is going to take some time. I feel that we need at least one more person to examine these results at the same time. Getting wave-transform right in the first place is important. Detecting bugs in this area will be tricky later on.

Thank you @cmc-rep! I can start verifying wavetransform-natural-loops.mir.

@cmc-rep

cmc-rep commented Jan 15, 2026

All MIR test results look good to me. I intend to approve this PR. Somehow, I cannot find the approval button on the GitHub page.

@cmc-rep cmc-rep self-requested a review January 15, 2026 20:11
@cmc-rep

cmc-rep commented Jan 15, 2026

After this PR, we should talk about how to optimize the scalar instructions added by wave-transform.

@lalaniket8
Author

wavetransform-natural-loops.mir is also correct.

@lalaniket8 lalaniket8 force-pushed the amd/dev/lalaniket8/non-ssa-compliant-wave-transform branch from 0d48750 to 66055fa Compare February 1, 2026 15:54
@z1-cciauto
Collaborator

@cdevadas cdevadas left a comment

Change LGTM except for the comments.

@lalaniket8 lalaniket8 force-pushed the amd/dev/lalaniket8/non-ssa-compliant-wave-transform branch from 66055fa to fc18f63 Compare February 2, 2026 10:35
@z1-cciauto
Collaborator

@lalaniket8 lalaniket8 merged commit fa0dbde into amd-feature/wave-transform Feb 2, 2026
5 checks passed
@lalaniket8 lalaniket8 deleted the amd/dev/lalaniket8/non-ssa-compliant-wave-transform branch February 2, 2026 15:36
6 participants